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Abstract. We introduce a new method to reconstruct the quantum matrix p 
of a system of n-qubits and estimate its rank d from data obtained by quantum 
state tomography measurements repeated m times. The procedure consists 
in minimizing the risk of a linear estimator p of p penalized by given rank 
(from 1 to 2 n ), where p is previously obtained by the moment method. We 
obtain simultaneously an estimator of the rank and the resulting state matrix 
associated to this rank. We establish an upper bound for the error of penalized 
estimator, evaluated with the Frobenius norm, which is of order dn(A/3) n /m 
and consistency for the estimator of the rank. The proposed methodology is 
computationnaly efficient and is illustrated with synthetic and real data sets. 
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1. Introduction 



The experimental study of quantum mechanical systems has made huge progress 
recently. Producing and manipulating large quantum mechanical systems is easier 
to achieve. Following such an experimental setup, we suppose that we have a 
source of quantum systems which are identically prepared in some state. The goal 
is to reconstruct this state via Quantum State Tomography (QST). The QST is 
an experimental process where each system is repeatedly measured with different 
elements of a projector- valued measure (PVM). 

Most popular methods for estimating the state from such data are: linear inver- 
sion [35], maximum likelihood [3], [H], [S] and Bayesian inference [I], [2], [5] (see 
also references therein). Recently, different approaches brought up-to-date statis- 
tical techniques in this field. The estimators are obtained via minimization of a 
penalized risk. The penalization will subject the estimator to constraints. In |13| 
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the penalty is the Von Neumann entropy of the state, while [S], [TU] use the Li 
penalty to implement the Lasso matrix estimator, under the assumption that the 
state to be estimated has low rank. These last papers assume that the number of 
measurements must be minimized in order to recover all the information that we 
need. The ideas of matrix completion is indeed, that, under the assumptions that 
the actual number of underlying parameters is small (which is the case under the 
low-rank assumption) only a fraction of all possible measurements will be sufficient 
to recover these parameters. The choice of the measurements is randomized and, 
under additional assumptions, the procedure will recover the underlying density 
matrix as well as with the full amount of measurements (the rates are within log 
factors slower than the rates when all measurements are performed). 



In this paper, we suppose that a reasonable amount m (e.g. m — 100) of data is 
available from all possible measurements. We implement a method to recover the 
whole density matrix and estimate its rank from this huge amount of data. Indeed, 
more experiments deal nowadays with entanglement and the low-rank assumption 
does not correspond to such experiments. Instead, all measurements are imple- 
mented and for each one m independent, identically distributed (i.i.d.) random 
variables are available. Our method is relatively easy to implement and computa- 
tionally efficient. Its starting point is a linear estimator obtained by the moment 
method (also known as the inversion method), which is projected on the set of 
matrices with fixed, known rank. A data-driven procedure will help us select the 
optimal rank and minimize the estimators risk in Frobenius norm. We proceed by 
minimizing the risk of the linear estimator, penalized by the rank. When estimat- 
ing the state matrix of a rt-qubits system, our final procedure has the risk (squared 
Frobenius norm) bounded by a term of order dn(3 / 4) n / m, where d between 1 and 
2™ is the rank of the matrix. 

The inversion method is known to be computationally easy but less convenient 
than constrained maximum likelihood estimators as it does not produce a density 
matrix as an output. We revisit the moment method in our setup and argue that 
we can still project the output on the set of density matrices, with the result that 
the distance to the true state can only be decreased in the proper norm. 

Moreover, the rank of the large density matrix is an indicator of entanglement 
in the system. We shall indicate how to project the linear estimator of the state on 
the space of matrices with fixed, known rank. Finally, we shall select the estimator 
which fits best to the data in terms of a rank-penalized error. Additionally, the 
rank selected by this procedure is a consistent estimator of the true rank d of the 
state matrix. 

We shall apply our procedure to the real data issued from experiments on systems 
of 4 to 8 ions. Trapped ion qubits are a promising candidate for building a quantum 
computer. An ion with a single electron in the valence shell is used. Two qubit 
states are encoded in two energy levels of the valence electrons, see [3], |16j . 
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The structure of the paper is as follows. Section 2 gives notation and setup of the 
problem. In Section 3 we present the moment method. We first change coordinates 
of the state matrix in the basis of Pauli matrices and vectorize the new matrix. We 
give properties of the linear operator which takes this vector of coefficients to the 
vector of probabilities p(a, r) . These are the probabilities to get a certain outcome 
r from a given measurement indexed by a and that we actually estimate from 
data at our disposal. We prove the invertibility of the operator, i.e. identifiability 
of the model (the information we measure enables us to uniquely determine the 
underlying parameters). Section 4 is dedicated to the estimation procedure. The 
linear estimator will be obtained by inversion of the vector of estimated coefficients. 
We describe the rank-penalized estimator and study its error bounds. We study 
the numerical properties of our procedure on synthetic data and apply them to 
real-data, in Section 5. The last section is dedicated to proofs. 

2. Basic notation and setup 

We have a system of n qubits. This system is represented by a 2™ x 2™ matrix p, 
with coefficients in C. This matrix is Hermitian p* = p, semidefinite positive p > 
and has Tr(p) = 1. It is the "state matrix" which fully describes the system of n 
particles. The objective is to estimate p, from measurements of many independent 
systems, identically prepared in this state. 

For each system, the experiment provides random data from separate measure- 
ments cr x , tjy, a z on each mode, where we use here the classical notation for the 
Pauli matrices. The collection of measurements which are performed writes 

(1) {cr a = <r ai ® . . . ® o an , ae £ n = {x,y,z} n }, 

where a = (ai, . . . , a n ) is a vector taking values in £ n which identifies the experi- 
ment. 

The outcome of the experiment will be a vector r E lZ n = {—1, 1}™. It follows 
from the basic principles of quantum mechanics that the outcome of any experiment 
indexed by a is actually a random variable, say P a , and that its distribution is given 
by: 

(2) Vr e n n , P(P a = r) = Tr (p ■ P" 1 <g> • • • ® F^) , 

where the matrices denote the projectors on the eigenvectors of a ai associated 
to the eigenvalue n, for all i from 1 to n. 

For the sake of simplicity, we propose to introduce the notation 

P a :=P?*®---®P£. 

As a consequence we have the shorter writing for P(i? a = r) = Tr (p • P a ). 

The tomographic inversion method for reconstructing p is based on estimating 
probabilities p(a, r) := P(i? a = r) by p(a, r) from available data and solving the 
linear system of equations 

(3) p(a,r) = Tr(p--P a ). 
It is known in statistics as the method of moments. 
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We shall use in the sequel the following notation: = Tr(A*A) denotes the 

Frobenius norm and ||vl|| = sup^gjjd \ v \ 2 —i \Av\2 the operator sup-norm for any dx d 
Hermitian matrix A, \v\2 is the Euclidean norm of the vector v £ M. d . 

In this paper, we give an explicit inversion formula for solving Then, we 
apply the inversion procedure to equation ([3]) and this will provide us an unbiased 
estimator p of p. Finally, we project this estimator on the subspace of matrices of 
rank k (k between 1 and 2") and thus choose, without any a priori assumption, the 
estimator which best fits the data. This is done by minimizing the penalized risk 

||iZ- p\\ 2 F + v ■ rank(i?), 

where the minimum is taken over all Hermitian, positive semidefinite matrices R. 
This optimization program has an explicit and easy to implement solution. The 
procedure will also estimate the rank of the matrix which best fits data. We actually 
follow here the rank-penalized estimation method proposed in the slightly different 
problems of matrix regression. This problem recently received a lot of attention in 
the statistical community [7J H21 \T7\ [19] and Chapter 9 in [14]. Here, we follow the 
computation in [TJ. 

In order to give such explicit inversion formula we first change the coordinates 
of the matrix p into a vector p <G R 4 " on a convenient basis. The linear inversion 
also gives information about the quality of each estimator of the coordinates in p. 
Thus we shall see that we have to perform all measurements cr a in order to recover 
(some) information on each coordinate of p. Also, some coordinates are estimated 
from several measurements and the accuracy of their estimators is thus better. 

To our knowledge, this is the first time that rank penalized estimation of a 
quantum state is performed. Parallel work of Guta, Monz (private communication) 
addresses the same issue via the maximum likelihood procedure. Other adaptive 
methods include matrix completion for low-rank matrices [51 [HI EH Q~5] and f° r 
matrices with small Von Neumann entropy |13| . 

3. IDENTIFIABILITY OF THE MODEL 

A model is identifiable if, for different values of the underlying parameters, we 
get different likelihoods (probability distributions) of our sample data. This is a 
crucial property for establishing the most elementary convergence properties of any 
estimator. 

The first step to explicit inversion formula is to change coordinates and express 
matrix p on the basis provided by the measurements. More precisely, let us put 
M n = {I, x, y, z} n and 07 = /. For all b e M n , denote similarly to |T]) 

(4) {cr b = <r bl ® . . . ® a bn , be M n }. 

Then, we have the following decomposition: 

P= ^b'CTb, With P(7,...,J) = — • 
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We can plug this last equation into © to obtain, for a £ £ n and r £ lZ n , 
P(i? a = r) = Tr (p • P r a ) 

= Tr ( E Pb-<7b-p r a ) 

\bGA4" / 

beM" 

n 

= E Pbll^K^)- 

beM" 3=1 

Finally, elementary computations lead to Tr(IP*) = 1 for any s £ { — 1,1} and 
t £ {x, y, z}, while Tr(a t P* ) = s\ t =f for any s £ { — 1,1} and (t,t') £ {x,y, z} 2 . 

For any b £ M n , we denote by = {j £ {1, .. .,n} : bj = I}. The above 
calculation have proved the following result. 

Proposition 3.1. For a £ £ n , and r £ 7Z n , we have 

P(P a = r) = J2 (*>■ II r i l i a i= h i)- 

Let us consider, for example, b = (x, . . . , x), then the associated set E^, is empty 
and P(p( x >---> x ) = r) is the only probability depending on P( x ,...,x) among other 
coefficients. Therefore, only the measurement (a x , . . . ,a x ) will bring information on 
this coefficient. Whereas, if b = (I, I, x, . . . , x), the set E^ contains 2 points. There 
are 3 2 measurements ((o~ x , cr x ), (<r z , a z , a x , cr x )) that will bring partial 
information on pb- This means, that a coefficient pb is estimated with higher 
accuracy as the size of the set Ei, increases. 

For the sake of shortness, let us put in vector form: 

P '•= (Pb)beM™ 

P := (P(r,a)) {r>a)e{n „ x£n) = (nR* = r))( P> a)6(W»x£»)- 

Our objective is to study the invertibility of the operator 

K 4 " -> M 6 " 
P i-> P- 

Thanks to Proposition 13. 11 this operator is linear. It can then be represented by 
a matrix P = [P( r ,a),b](r,a)e(R»x£»),beM». we wiu tnen lmve: 

(5) V(r,a)€(R"xn p (r , a) = ]T p b P( r ,a),b 

beX" 

and from Proposition 13.11 we know that 

We want to solve the linear equation Pp = p. Recall that E^ is the set of indices 
where the vector b has an / operator. Denote by d(h) the cardinality of the set 
Eb- 
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Proposition 3.2. The matrix P T P is a diagonal matrix with non-zero coefficients 
given by 

(P T P)b, b = 3 d(b) 2". 

As a consequence the operator is invertible, and the equation Pp = p has a unique 
solution: 

p = (P T P)- 1 P T p. 

In other words, we can reconstruct p — {pb)beM n from p, in the following way: 

P h = 3 d(b)2« H P(r,a)P(r,a),b- 

(r,a)G(7Z"xf ") 

This formula confirms the intuition that, the larger is d(b), the more measurements 
cr a will contribute to recover the coefficient pb- We expect higher accuracy for 
estimating pb when d(h) is large. 

4. Estimation procedure and error bounds 

In practice, we do not observe P(i? a = r) for any a and r. For any a, we have a 
set of m independent experiments, whose outcomes are denoted by J2 a,t , 1 < i < m. 
Our setup is that the i? a ' J are independent, identically distributed (i.i.d.) random 
variables, distributed as i? a . 

We then have a natural estimator for P( r a ) = P(R SL = r): 

^ m 



We can of course write p = (p( 



r.a 



4.1. Linear estimator. We apply the inversion formula to the estimated vector 
p. Following Proposition 13 . 21 we can define: 

(6) ^(p^pr^p. 

Put it differently: 



P h 3d(b) 2 r. 



(b)9« P(r,a)P(r,a),b 

(r,a)G(-R"x£") 

and then, the linear estimator obtained by inversion, is 
(7) P= P h<Th - 

b£M n 

The next result gives asymptotic properties of the estimator p of p. 

Proposition 4.1. The estimator p of p, defined in (0) has the following properties: 

(1) it is unbiased, that is E[/5] = p: 

(2) it has variance bounded as follows 

1 



Var{p b ) < 



3 d ( b )4"m' 
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(3) for any e > 0, 



Note again that the accuracy for estimating pb is higher when d(h) is large. 
Indeed, in this case more measurements bring partial information on pt>- 

The concentration inequality gives a bound on the norm \\p — p\\ which is valid 
with high probability. The bound we obtain above depends on log(2 n ), which is 
expected as 4 n — 1 is the total number of parameters of a full rank system. This 
factor appears in the matrix Hoeffding inequality that we use in order to prove this 
bound. 

4.2. Rank penalized estimator. We investigate low-rank estimates of p defined 
in J7]). From now on, we follow closely the results in |7J which were obtained for 
a matrix regression model, with some differences as our model is different. Let us, 
for a positive real value v study the estimator: 

* Til * 1 1 2 

(8) p u = argmin — p|| F + v ■ rank(i?) , 

where the minimum is taken over all Hermitian matrices R. In order to compute 
the solution of this optimization program, we may write it in a more convenient 
form since 



~ l 2 

(9) min \\R — p\\ p + v ■ rank(i?) 



mm mm 

k R:rank(R)=k 



In -II 2 , ' 

\H — p „ + V ■ K 



An efficient algorithm is available to solve the minimization program Q as a 
spectral-based decomposition algorithm provided in [18J. Let us denote by Rk 



the matrix such that \\Rk — p\\j? = rnin/j. :rank (/j.) =fc ||i? — + v ■ k 
projection of the linear estimator on the space of matrices with fixed (given) rank 
k. Our procedure selects automatically out of data the rank k. We see in the sequel 
that the estimators R^ and p v actually coincide. 

We study the statistical performance from a numerical point of view later on. 

Theorem 4.2. For any > put c(6) — 1 + 2/9. We have on the event \v > 
(1 + 61)11,5 -p|| 2 } that 

WPu - P\\ 2 F < min J c 2 {6) ^(p) + 2c{6)vk 1 , 

[ j>k J 

where \j(p) for j = 1, . . . , 2™ are the eigenvalues of p ordered decreasingly. 

Note that, if rank(p) = d, for some d between 1 and 2™, then the previous 
inequality becomes 

WPv-PWf < 2c(6)vd. 

Let us study the choice of v in Theorem 14.21 such that the probability of the event 
{v > (1 + 9)\\p — p\\ 2 } is small. By putting together the previous theorem and 
Proposition 14. II we get the following result: 



This is a 
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Corollary 4.3. For any 6 > put c(6) = 1 + 2/9 and for some small e > choose 

'fy nlog(2)-log( £ ) 
3, 



v{6, e) = 32(1 + ! 
Then, we have 



\\Pu{6,s) ~ P\\l < mm I c 2 (6) J2 A|(P) + M0>k I , 

[ j>k J 

with probability larger than 1 — e. 

Again, if the true rank of the underlying system is d 7 we can write that, for any 
6 > and for some small e > 0: 

'4\" nlog(2)-log( £ ) 
3J m 

with probability larger than 1 — e. If we renormalize the left-hand side with respect 
to the dimension, note that the bound becomes: 



\Pu-p\\% < 64c(0)(l + B)d 



ill^-,*<640( W l + «»)4? V *" 1Og<2) - 1Og(e) 



The next result will state properties of fc, the rank of the final estimator p v . 

Corollary 4.4. // there exists k such that Xk(p) > (1 + o~)\/v and Xk+i{p) < 
(1 — 5)\/i> for some S in (0, 1], then 

P(k = k) > 1 -F{\\p-p\\ > S^). 

From an asymptotic point of view, this corollary means that, if d is the rank 
of the underlying matrix p, then our procedure is consistent in finding the rank. 
Indeed, as y/v is an upper bound of the norm — p\\, it tends to asymptotically 
and therefore the assumptions of the previous corollary will be checked for k = d. 
With a finite sample, we deduce from the previous result that k actually evaluates 
the first eigenvalue which is above a threshold related to the largest eigenvalue of 
the noise p — p. 

5. Numerical performance of the procedure 

In this section we implement an efficient procedure to solve the optimization 
problem © from the previous section. We are interested in two aspects of the 
method: its ability to select the rank correctly and the correct choice of the penalty. 
First, we explore the penalized procedure on synthetic data and tune the parameter 
v conveniently. In this way, we evaluate the performance of the linear estimator 
and of the rank selector. We then apply the method on real data sets. 

The algorithm for solving © is given in [15]. We adapt it to our context and 
obtain the simple procedure. 
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Algorithm: 

Inputs: The linear estimator p and a positive value of the tuning parameter v 
Outputs: An estimation k of the rank and an approximation Rt of the state matrix. 

Step 1. Compute the eigenvectors V = [v\, . . . , Dsn] corresponding to the eigenval- 
ues of the matrix p*p sorted in decreasing order. 
Step 2. Let U = pV. 

Step 3. For k — 1, . . . , 2™, let Vu and £4 be the restrictions to their k first columns 

of V and U, respectively. 
Step 4. For k = 1, . . . , 2™, compute the estimators Rk — UkV k * . 
Step 5. Compute the final solution R kl where, for a given positive value v, k is 

defined as the minimizer in k over {1, . . . , 2™} of 



Rk- P 



2 



v ■ k. 



The constant k in the above procedure plays the role of the rank and then Rk 
is the best approximation of p with a matrix of rank k. As a consequence, this 
approach provides an estimation of both of the matrix p and of its rank d by R k 
and k, respectively. 

Obviously, this solution is highly linked to the value of the tuning parameter v. 
Before dealing with how to calibrate this parameter, let us present a property that 
should help us reduce the computational cost of the method. 

The above algorithm is simple but requires the computation of 2™ matrices in 
Step 3 and Step 4. We present here an alternative which makes possible to com- 
pute only the matrix Rk that corresponds to k = k, and then reduce the storage 
requirements. 

Remember that k is the value of k minimizing the quantity in Step 5 of the above 
algorithm. Let Xi(p) > Aa(p) > ... be the ordered eigenvalues of \J p*p. According 
to Proposition 1], it turns out that k is the largest k such that the eigenvalue 
Afc(p) exceeds the threshold ^Jv: 

(10) k = max{fc : Xk(p) > V^}- 

As a consequence, one can compute the eigenvalues of the matrix \J p*p and set k 
as in (fTTTj) . This value is then used to compute the best solution R k thanks to Step 
1 to Step 4 in the above algorithm, with the major difference that we restrict Step 
3 and Step 4 to only k = k. 

Synthetic Data 

We build artificial state matrices p with a given rank d in {1,...,6}. These 
matrices are 2 n x 2" with n = 4 and 5. To construct such a matrix, we define p as 
the diagonal matrix with its first d diagonal terms equal 1/d, whereas the others 
equal zero. 

We aim at testing how often we select the right rank based on the method 
illustrated in (flO)) as a function of the rank d, and of the number m of repetitions of 
the measurements we have in hand. Our algorithm depends on the tuning parameter 
v. We use and compare two different values of the threshold v. denote by i4p 
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(2) 

and v n the values the parameter v provided in Theorem 14.21 and Corollary 14.31 
respectively. That is, 

(11) W = \\P-P\? and ^=32(1 + ^^(2) 



3/ TO 

As established in Theorem l4.2[ if the tuning parameter v is of order of the parameter 
Vn , the solution of our algorithm is an accurate estimate of p. We emphasize the 
fact that v n is nothing but the estimation error of our linear estimator p. We study 
this error below. On the other hand, the parameter v n 2 ^ is an upper bound of v^ 
that ensures that the accuracy of estimation remains valid with high probability 
(c/. Corollary 14. 3p . The main advantage of v n is that it is completely known by 
the practitioner, which is not the case of v n . 



Rank estimation. Our first goal consists in illustrating the estimation power of 
our method in the selecting the true rank d based on the calibrations of v given 
by (fTTj) . We provide some conclusions on the number of repetitions m of the 
measurements needed to recover the right rank as a function of this rank. Figure O 
illustrates the evolution of the selection power of our method based on v n (blue 

(2) 

stars) on the one hand, and based on v n (green squares) on the other hand. 



FIGURE 1 . Frequency of good selection of the true rank d with respect to 



d, based on 1101 with v - 



(2) 

(green squares) and with v = Uji (blue stars). 



The results are established on 20 repetitions. A value equal to 1 in the y-axis 
means that the method always selects the good rank, whereas means that it 
always fails. Left: m = 50 measurements — Right: m = 100 measurements 



Frequency of good selection o( the true rank d (n=4 and m=50] 




Frequency of good selection of the true rank d (n=4 and m=100) 




True rank d of the target matrix 



True rank d of the target matrix 



Two conclusions can be made. First, the method based on v n is powerful. It 

(2) 

almost always selects the right rank. It outperforms the v n -based algorithm. This 
is an interesting observation. Indeed, v n 2 ^ is an upper bound of v n . It seems that 
this bound is too large and can be used only for particular settings. Note however 
that in the variable selection literature, the calibration of the tuning parameter is a 
major issue and is often fixed by Cross- Validation or other empirical methods, see 
e.g. the simulation study in [20] , We have chosen here to illustrate only the result 
based on our theory and we will provide later an instruction to properly calibrate 
the tuning parameter v. 
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The second conclusion goes in the direction of this instruction. As expected, the 
selection power of the method (based on both Vn and v^) increases when the 
number of repetition m of the measurements increases. Compare the left (m = 50 
repetitions) to the right (m = 100 repetitions) picture in Figure [5] Moreover, up 
to a certain rank, the methods always select the good rank. For larger ranks, they 
perform poorly. For instance with m = 50 (a small number of measurements), we 

(2) 

observe that the algorithm based on v n performs poorly when the rank d > 4, 
whereas the algorithm based on Vn is still excellent. 

Actually, the bad selection when d is large does not mean that the methods perform 
poorly. Indeed our definition of the matrix p implies that the eigenvalues of the 
matrix decrease with d. They equal to 1 jd. Therefore, if y/V is of the same order 
as 1/d, finding the exact rank becomes difficult since this calibration suggests that 
the eigenvalues are of the same order of magnitude as the error. Hence, in such 
situation, our method adapts to the context and finds the effective rank of p. As 
an example, let us consider our study with n — 4, m = 50 and d = 6. Based on 20 
repetitions of the experiment, we obtain a maximal value of z/„ = \\p — p\\ 2 equal 
to 0.132. This value is quite close to 0.167, the value of the eigenvalues of p. This 
explains the fact that our method based on failed in one iteration (among 20) 

(2) 

to find the good rank. In this context z/„ is much larger than 0.167 and then the 
selection of the rank d is not efficient with this calibration and in this setting. 
Let us also mention that we explored numerous experiments with other choices of 
the diagonal matrix p. The same conclusion remains valid. When the error of the 
linear estimator p which is given by = \\p — p\\ 2 is close to the square of the 
smallest eigenvalue of p, finding the exact rank is a difficult task. However, the 
method based on Vn is still good, but fails sometimes. 



FIGURE 2. Evaluation of the operator norm y v„ = ||p — p\\. The results 
are established on 20 repetitions. Left: n = 4, m = 50 repetitions of the 
measurements ; we compare the errors when d takes values betwenn 1 and 6 — 
Center: n = 5, m = 100 ; we compare the errors when d takes values between 
1 and 6 - Right: the rank equals d = 4 and compare the error for m = 50 and 

100. 



Calibration of the tuning parameter v. The quantity v n = \\p — p\\ 2 seems 
to be very important to provide a good estimation of the rank d (or more precisely 
of the effective rank). Then it is interesting to observe how this quantity behaves. 
Figure [5] (Left m = 50 and d = 4, and Center m = 100 and d = 5) illustrates 
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how Vn varies when the rank increases. Except for d = 1, it seems that the value 
of iff is quite stable. These graphics are obtained with particular values of the 
parameters m and d, but similar illustrations can be obtained if these parameters 
change. 

The main observation according to the parameter v is that it decreases with m 
(see Figure [5]- Right) and is actually independent of the rank d (with some strange 

(2) 

behavior when d = 1). This is in accordance with the definition of v)i which is an 
upper bound of v$ . 

Real-data analysis 

In the next paragraph, we propose a 2-steps instruction for practitioners to use 
our method in order to estimate a matrix p (and its rank d) obtained from the data 
R a ' 1 we have in hand with a £ {x, y, z} and i £ {1, . . . , m}. 

Real Data Algorithm: 

Inputs: for any measurement a £ {x, y, z} we observe R a ' 1 , i = 1, . . . , m. 

Outputs: k and R^, estimations of the rank d and p respectively. 

The procedure starts with the linear estimator p and consists in two steps: 

Step A. Use p to simulate repeatedly data with the same parameters n and m as 
the original problem. Use the data to compute synthetic linear estimators and the 
mean operator norm of these estimators. They provide an evaluation of the tuning 
parameter . 

Step B. Find k using (fTU)) and construct R^. 

We have applied the method to real data sets concerning systems of 4 to 6 ions. 
In Figure Owe plot the eigenvalues of the linear estimator and the threshold given 
by the penalty. In each case, the method selects a rank equal to 2, which possibly 
means that the system is not entangled. 



FIGURE 3. Eigenvalues of the linear estimator in increasing order and the 
penalty choice; m = 100 and n = 4, 5 or 6, respectively. 
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6. Appendix 
Proof of Proposition \3. 6 A Actually, we can compute 

(P T P) bl , b2 = J2 II r J I ( a i = ft ij) II r k I(a k = ba, k ). 

In case bi = b 2 = b, we have 

(P T Pkb = e f n r ^= h j) 

(r,a) \j?E b 

-En ^ = ^ = 3rf(b)2n - 

(r,a) j££ b 

In case bi ^ b 2 , we have either Eb 1 — Ey, 2 or Ey yi ^ Eb 2 . If we suppose E\ 3l = E\> 2 , 
I [ rj I(aj = 6i,j) r fc I(a fe = 6 2 ,fe) = 0. 

Indeed, if this is not it means a = bi = b 2 outside the set E^ , that is bi = b 2 
which contradicts our assumption. 

If we suppose E^ 1 ^ Eb 2 , we have either bi ^ b 2 on the set E^ E^ and 
in this case one indicator in the product is bound to be 0, or we have bi ^ b 2 
on the set E^ E^ . In this last case, take jo in the symmetric difference of sets 
E hl AE h2 . Then, 

(P T P) bl ,b 2 - Ell r i I ( a i = b hi) n r k I{a k = b» lk ) 

(r,a) j£E bl k£E b2 

= e n r ( a J = n j ( a fc = b %k) n rj 

(r,a) j£-E bl k^E b2 j£E bl AE b2 

= E ^ E E n = 

>"ioe{-l,l} r?r So a j£E bl 

n n r J=°- 

fc££b 2 jeB bl AB b2 /j 

□ 

Proof of Proposition ^. 1\ It is easy to see that p is an unbiased estimator. We write 
its variance as follows: 



Var(p h ) = 32<J(b)4 



32<i(b)4 
1 



^ E Var ( E ^£ 1 «-'=r P ('.«).b) 

ag£™ i=l / 

J^2 E E m P(r,a)P(r,a).b ~ ™ I E P(r,a)P(r,a),b 



e ^.a) n I ^ = 



3 2<i(b)4« TO 

(r,a)e(W»x£ n ) j££ b 



Finally, let us prove the last point. We will use the following result due to [21j. 
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Theorem 6.1 (Matrix Hoeffding's inequality [H]). Let X\, X p be indepen- 
dent centered self-adjoint random matrices with values in C dxd , and let us as- 
sume that there are deterministic self-adjoint matrices A\, A p such that, for all 
i G {1, A\ — Xf is a.s. nonnegative. Then, for all t > 0, 



> t < dexp 



(— 

\8a 2 



where o-2 = \\J2l =1 Al\\. 
We have: 

P~P = ^2(Pb ~ Pb)o- b 



EEE 

bra 



EEEE 

b r a i 
EE^- 



(r,a),b 



3<i(b)2r. 



(Pr,a - _Pr,a)0b 



*(r,a),b 
3 d(b) 2 n TO 



(l R i,a =r -p r ^)a h 



where 



E-v-,,: EE 



b r 



" (r,a),b 
3<i(b)2™ 



(lfli,a=r -Pr,a)o-b- 



Note that the Xi, a , for i £ {l,...,m} and a € are independent self-adjoint 
centered random matrices. Moreover, we have: 



l*i,a|| = 



E E 3<i(b r )2»m (lfli,a = r " Pr ' a)ab 

b r 



^EE 

b 

^E 



(r,a),b 



3d(b)2r. 

b r 

3 d(b) 2 ™ TO 



Hfl*.«=r -Pr,a\ lkb| 



< 



2 2 -i— r 2 - " 

rim 5^ 3<2(b) II- ~^ a j= b j — 2 n m 



2™m ^ 3 d ( 

b j$E: 



-E 



2™m 



n\ 1 



I J 3* 2 n r 



-0 b such that 

d(b) = t 
Vj (g Eb.a^ = 

2/2 
m V 3 



1 

3^ 



This proves that Af ^ — Xf^ is nonnegative where Aj, a = ^ (1)™ ^- So we can a PPly 
Theorem 16. 11 we have: 
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and so 

-t 2 m /3 



F{\\p-p\\>t) 
We choose t such that 

this leads to: 



> t\ < 2™ exp 



32 V 4 



e = 2™ exp' ' "' ' " 



32 \4 




nlog(2)-log(e) 



< e. 



m 



a 

Proof of Theorem \4-2\ From the definition of our estimator, we have, for any 
Hermitian, positive semi-definite matrix R, 

1 1 * 1 1 2 ~ ii ~ 1 1 2 

\\p~v - p\\ F + ^rank(p„) < \\R — p\\ p + z^rank(i?). 
We deduce that 

\\pv-p\\ 2 F < \\R-p\\l + 2Tr{{p-p)*(R-p v )) + v{r&i&{R)-mnk{p-„)) 
< \\R-p\\ F + 2vTaak(R) + 2\\p-p\\ x p-pvHi 
— ^(rank(i?) + rank(p„)). 
Further on, we have 

\\R-Puh < (raiMR) +™nk(M) 1/2 \\R- P»\\f 

< (rank(-R) + rank( / 5 I/ )) 1 /2(|| / o _ p v \\ F + \\R- p\\ F ) 

We apply two times the inequality 2A ■ B < eA 2 + e~ 1 B 2 for any real numbers A, B 
and e > 0. We actually use e = 1 + 0/2 and e = 8/2, respectively, and get 

1 1 ~ 1 1 2 2 

\\p v — p < \\R — p\\ F + 2wank(i?) — z/(rank(i?) + rank(p l/ )) 



-(1 + 6>)(rank(i?) + rank(^))||p - p 



2 



Hl + ^Wp^pWl + i^WR-pWl- 
By rearranging the previous terms, we get that for any Hermitian matrix R 

\\p* -P\\p< c 2 (Q)\\R- P\\f + 2e(0>rank( J R), 
provided that v > (1 + 0)\\p — p\\ 2 . By following [7], the least possible value for 
\\R — p\\ F is Ylj>k ^ tne ma trices R have rank k. Moreover, this value is 

obviously attained by the projection of p on the space of the eigenvectors associated 
to the k largest eigenvalues. This helps us conclude the proof of the theorem. 

□ 

Proof of Corollary \4-4\ Recall that k is the largest k such that A& (p) > ^/v. We 
have 

P{k £k) = P(A*(p) < or A fc +i(p) > y/v). 
Now, Afe(p) < Afe(p) + ||p- p\\ and A fc+ i(p) > A fe+ i(p) - ||p- p\\. Thus, 

P(k^k) <P(||p-p|| >min{A fe (p)-V^,V^-A fe +i(p)}) 
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and this is smaller than P(||p— p\\ > by the assumptions of the Corollary. □ 

Acknowledgements: We are most grateful to Madalin Guta and to Thomas 
Monz for useful discussion and for providing us the experimental data used in this 
manuscript. 
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